Generating natural F0 trajectory with additive trees
Abstract
In HMM-based TTS, the segmental quality of synthesized speech is quite acceptable, but intonation, especially at the sentence level, tends to be somewhat bland. The maximum likelihood (ML) criterion used in HMM training and parameter trajectory generation is partially responsible for this blandness. In addition, the F0 trajectory generated this way has a smaller dynamic range than that of natural speech, so the synthesized speech does not sound lively. We propose to use multiple additive regression trees, a gradient-based tree-boosting algorithm, to produce a more natural F0 trajectory. The additive trees are trained in successive stages to minimize the squared error between natural and predicted F0 values. Additive tree modeling is integrated with MSD-HMM, which is an ideal model for characterizing the partially continuous (voiced/unvoiced) F0 contour. Experimental results in both Mandarin and English TTS trials show that the proposed approach not only increases the dynamic range of the generated F0 trajectory but also improves other objective measures (RMSE, correlation coefficient, voiced/unvoiced swapping errors) and subjective quality measures.
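The stage-wise training described in the abstract can be illustrated with a minimal sketch of additive regression trees under squared loss. It is an illustration only, not the paper's system: the trees are depth-1 stumps, the single input feature `x` (for example, a normalized frame position) and the synthetic F0 values are hypothetical, and the function names, `n_stages`, and `learning_rate` are assumptions. The integration with MSD-HMM and the voiced/unvoiced decision are not shown.

```python
import math

def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: pick the threshold minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    # Skip the largest value so the right-hand leaf is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return t, lmean, rmean

def predict_stump(stump, x):
    t, lmean, rmean = stump
    return lmean if x <= t else rmean

def fit_additive_trees(xs, ys, n_stages=50, learning_rate=0.3):
    """Train stumps in successive stages, each on the current residuals."""
    base = sum(ys) / len(ys)          # stage 0: constant prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_stages):
        # For squared loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, x) for s in stumps)

def rmse(model, xs, ys):
    se = sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(se / len(ys))

# Synthetic "F0 contour" in Hz over a hypothetical position feature.
xs = [i / 10 for i in range(40)]
ys = [120 + 30 * math.sin(x) for x in xs]

one_stage = fit_additive_trees(xs, ys, n_stages=1)
many_stages = fit_additive_trees(xs, ys, n_stages=50)
print(rmse(one_stage, xs, ys), rmse(many_stages, xs, ys))
```

Each stage fits a small tree to whatever error the previous stages left behind, so the training RMSE cannot increase as stages are added. A shrinkage factor such as `learning_rate` is a common way to trade per-stage progress for smoother fitting.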
Related resources
Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors
An F0 and voicing status estimation algorithm for speech analysis/synthesis is proposed. Instead of directly modeling speech signals, the proposed algorithm models the behavior of feature extractors under additive noise using a bank of Gaussian mixture models, trained on artificial data generated from Monte-Carlo simulations. The conditional distributions of F0 predicted by the GMMs are combined...
Integration of Intonation in F0 Trajectory prediction using MSD-HMMs
Current research in speech synthesis places increasing emphasis on spectral continuity and diverse prosodic effects. The trainable HMM-based speech synthesis method tends to generate more continuous spectral structures than the traditional unit selection method. However, the F0 trajectory generated by HMM-based speech synthesis is often excessively smoothed and lacks prosodic variance. Th...
Improving YANGsaf F0 Estimator with Adaptive Kalman Filter
We present improvements to the refinement stage of YANGsaf [1] (Yet ANother Glottal source analysis framework), a recently published F0 estimation algorithm by Kawahara et al., for noisy/breathy speech signals. The baseline system, based on time-warping and weighted averaging of multi-band instantaneous frequency estimates, is still sensitive to additive noise when none of the harmonics provide rel...
Effects of auditory feedback on F0 trajectory generation
In this paper, a method is proposed to evaluate contributions of auditory feedback to speech F0 trajectory generation. This method is based on data obtained in a series of new auditory feedback experiments (TAF: transformed auditory feedback) in which quantitative measurements were taken of interactions between speech perception and production under natural speech conditions. Experimental resul...
The importance of segmental duration and f0 for generating more natural intonation in synthetic speech
This dissertation examines the importance of diphone duration and f0 information in generating more natural intonation in unit selection speech synthesis. The results showed that diphone duration and f0 information were highly correlated with each other due to the prosodic properties inherited from the recorded human speech. Also only raising the importance of duration and f0 information large...